1 Introduction

Ontologies in the life sciences, in particular members of the OBO Foundry [5],
contain information about species, proteins, chemicals, genomes, pathways,
diseases, etc. Information in these ontologies might overlap, and a certain
concept may be defined in different ontologies from different points of view
and at different levels of granularity. Therefore, combining information
from different ontologies is a useful way to create a new ontology.

Case Study The integration will be illustrated with a case study on Toll-like
receptors. If we investigate what kind of information about Toll-like
receptors is available in the Molecule Role Ontology (MoleculeRoleOntology) [B],
we see that Toll-like receptors are defined as pattern recognition receptors.
In the Biological Process Ontology (GO) [6] the Toll-like receptors
are described in the context of signaling pathways and are subsumed by the
pattern recognition receptor signaling pathway. In the PROTEIN [B] ontology a
Toll-like receptor is just a protein. In the NCI_Thesaurus fS] ontology Toll-like
receptors are defined as Cell Surface Receptors. It follows from the foregoing
that multiple ontologies model different aspects of the same concept, and that
the combination of the available information provides more knowledge about the
concepts in which an ontology developer is interested.

We introduce an approach for generating a new ontology in which ontologies
from the OBO Foundry are reused. First, we extract modules from these
ontologies on the basis of the well-defined modularity approach [2]. As a
signature for the modules we use the symbols that match the terms of interest
indicated by the user. In our case study we create an ontology about Toll-like
receptors, therefore we use two seed terms (Toll, TLR). Subsequently, we create
mappings between concepts in the modules. It has already been shown [1] that
simple similarity algorithms outperform structural similarity algorithms in
biomedical ontologies. Therefore, we have based our mappings on the similarity
distance [1] between labels and synonyms of classes in the modules. Finally,
a new ontology is created in which the mappings are represented by means of
OWL:equivalentClass axioms and the small, concise modules are imported.

2 Modules from Enriched Signature

In our case study we have used the following biomedical ontologies obtained from
the OBO Foundry: National Cancer Institute Ontology (NCI_Thesaurus), GO
Ontology (GO), Protein Ontology (PRO), Dendritic Cell ontology
(dendritic_cell), Pathway ontology (pathway), Molecule Role Ontology
(MoleculeRoleOntology), Gene Regulation Ontology (gene_regulation),
and finally, Medical Subject Headings ontology (MeSH). All of these ontologies
are in OBO format, except for the NCI_Thesaurus, which is in OWL format.

A module comprises the knowledge of a part of the domain that is dedicated to
a set of terms of user interest (seed terms). Let $T_1 = \{Toll, TLR\}$ be this
set. Let $S_1$ be a set of terms (signature) from the ontology $O_1$ that
represents the classes whose labels, descriptions, IDs, or other annotation
properties contain the symbols from $T_1$. The first module that we have
extracted is the module $M_1$ from NCI_Thesaurus. It is chosen because
NCI_Thesaurus is the largest ontology and contains the most matches. In order
to generate a signature for the next ontology $O_2$, we use not only the terms
from $T_1$ but enrich this set with the terms from the module $M_1$. The same
procedure is applied to the rest of the ontologies, namely module $M_i$ is
extracted on the basis of the terms $T_i = Sig(M_{i-1}) \cup T_{i-1}$.
This method has two drawbacks. First, it depends on the order of the
ontologies. Second, the generation of a new module $M_i$ can introduce new
symbols that match symbols from ontologies used in previous steps. These
problems can be solved with the generation of a fixpoint.
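
The sequential enrichment described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the case study: ontologies are reduced to dictionaries of class IDs with a label and parent list, matching is a case-insensitive substring test, and `extract_module` is a hypothetical stand-in (signature plus ancestors) for the modularity approach of [2]. All function names and the toy data are invented for the example.

```python
from typing import Dict, List, Set

# Toy ontology layout (an assumption): class ID -> {"label": str, "parents": [IDs]}.
Ontology = Dict[str, dict]


def match_signature(ontology: Ontology, terms: Set[str]) -> Set[str]:
    """Return IDs of classes whose ID or label contains any term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return {
        cid
        for cid, info in ontology.items()
        if any(t in cid.lower() or t in info["label"].lower() for t in lowered)
    }


def extract_module(ontology: Ontology, signature: Set[str]) -> Set[str]:
    """Hypothetical stand-in for module extraction: the signature plus all ancestors."""
    module, stack = set(), list(signature)
    while stack:
        cid = stack.pop()
        if cid not in module:
            module.add(cid)
            stack.extend(ontology[cid]["parents"])
    return module


def module_terms(ontology: Ontology, module: Set[str]) -> Set[str]:
    """Sig(M) approximated by the labels of the classes in the module."""
    return {ontology[cid]["label"] for cid in module}


def sequential_modules(ontologies: List[Ontology], seeds: Set[str]) -> List[Set[str]]:
    """Extract M_i from O_i using the enriched terms T_i = Sig(M_{i-1}) | T_{i-1}."""
    terms, modules = set(seeds), []
    for onto in ontologies:
        module = extract_module(onto, match_signature(onto, terms))
        modules.append(module)
        terms |= module_terms(onto, module)
    return modules


o1 = {
    "A:1": {"label": "Toll-like receptor", "parents": ["A:2"]},
    "A:2": {"label": "cell surface receptor", "parents": []},
}
o2 = {
    "B:1": {"label": "cell surface receptor signaling", "parents": []},
    "B:2": {"label": "metabolism", "parents": []},
}
modules = sequential_modules([o1, o2], {"Toll", "TLR"})
reversed_modules = sequential_modules([o2, o1], {"Toll", "TLR"})
```

In this toy setting the second ontology contributes a class only because the first module enriched the term set; processing the ontologies in the opposite order yields an empty module for `o2`, which is the order dependence noted above.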

Fixpoint Modules We have investigated whether or not we will find a fixpoint
with our module extraction method. The fixpoint is reached at the moment the
set of terms $T$ which is used in order to generate modules during step $t_i$
does not change any more after another run with all ontologies. This can be
written as
$\bigcup_{k=1}^{n} Sig(M_{k,i}) = \bigcup_{k=1}^{n} Sig(M_{k,i+1})$,
where $M_{k,i}$ is the module $k$ created during step $t_i$.
It can be formulated in a "fixpoint-like" way: $Match(T) = T$.
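
The fixpoint condition can be illustrated with a short Python sketch. It assumes that one run over all ontologies is modelled as a function `match` that returns the current term set together with the terms of every module extracted from it; the extractors here are hypothetical toy functions, not real module extractors, and `max_steps` is a safety bound added for the example.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Terms = FrozenSet[str]
# Hypothetical extractor: maps a term set to the terms of the module it yields.
Extractor = Callable[[Terms], Iterable[str]]


def match(terms: Terms, extractors: List[Extractor]) -> Terms:
    """One run over all ontologies: T together with every Sig(M_k) obtained from T."""
    result = set(terms)
    for extract in extractors:
        result.update(extract(terms))
    return frozenset(result)


def fixpoint(seeds: Iterable[str], extractors: List[Extractor],
             max_steps: int = 100) -> Tuple[Terms, int]:
    """Iterate T <- Match(T) until Match(T) = T; return T and the number of runs."""
    terms = frozenset(seeds)
    for step in range(1, max_steps + 1):
        new_terms = match(terms, extractors)
        if new_terms == terms:
            return terms, step
        terms = new_terms
    raise RuntimeError("no fixpoint within max_steps")


def make_extractor(links: dict) -> Extractor:
    """Toy ontology: each known term pulls in the terms linked to it."""
    def extract(terms: Terms) -> Iterable[str]:
        return {new for t in terms for new in links.get(t, ())}
    return extract


extractors = [
    make_extractor({"Toll": ["receptor"], "pathway": ["signaling"]}),
    make_extractor({"receptor": ["pathway"]}),
]
terms, steps = fixpoint({"Toll", "TLR"}, extractors)
```

Because the term set in this sketch only grows and the toy ontologies are finite, the loop must terminate; the result no longer depends on the order in which the extractors are listed.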

The fixpoint was reached with the module sizes listed in Table 1.



3 Ontology Mapping 

In this paper we use a looser definition of the concept of mapping than the
definition given in [3], in which a mapping is a morphism. In our approach a
mapping is a partial function from a subset $S_1 \subseteq Sig(O_1)$ to a
subset $S_2 \subseteq Sig(O_2)$. We deliberately drop the morphism
requirement, so the structural dependencies will not necessarily be preserved
after mapping. We do so because we are interested in the consequences of this
mapping for the original ontologies, namely whether and how the structural
dependencies will be broken.

For our experimental prototype system we use our own mappings based on
syntactic similarity. It has already been shown [1] that in the case of
biomedical ontologies simple mapping methods are sufficient and outperform
more complex methods.

Table 1: The size of the modules after reaching the fixpoint

module                            size in KB
Toll_from_gene_regulation               88.7
Toll_from_protein                       23.4
Toll_from_chebi                        218.6
Toll_from_mesh                          59.2
Toll_from_dendritic_cell                 4.2
Toll_from_pathway                        4.1
Toll_from_cellular_component            35.4
Toll_from_molecular_function            11.4
Toll_from_MoleculeRoleOntology          46.9
Toll_from_biological_process           221.1
Toll_from_Thesaurus                    802.1



We compare characteristics (id, label, description) of all classes from
ontology $O_1$ with the same characteristics of all classes from ontology
$O_2$. The comparison is based on the Levenshtein distance algorithm. We have
adapted the Levenshtein distance and introduce a metric $Lev$ (in the range
$[0 \ldots 1]$). Two classes $C_i$ and $C_j$ are considered to be similar if
they have the maximum value of the $Lev$ metric and if this value is also
higher than the threshold $t = 0.95$, which was determined experimentally.
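
A possible reading of this matching step is sketched below in Python. The exact adaptation of the Levenshtein distance is not specified in the text, so the normalisation used here (one minus the distance divided by the length of the longer string, after lower-casing) is an assumption, as are the function names and the toy labels.

```python
from typing import Dict, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions), row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def lev(a: str, b: str) -> float:
    """Assumed normalisation into [0, 1]: 1.0 means identical strings."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(label: str, candidates: Dict[str, str],
               threshold: float = 0.95) -> Optional[Tuple[str, float]]:
    """Return (class ID, score) with maximal Lev, but only if it exceeds the threshold."""
    if not candidates:
        return None
    cid, score = max(((c, lev(label, text)) for c, text in candidates.items()),
                     key=lambda pair: pair[1])
    return (cid, score) if score > threshold else None


targets = {"X:1": "toll-like receptor", "X:2": "toll-like receptor 4"}
exact = best_match("Toll-like receptor", targets)
near = best_match("Toll like receptors", targets)
```

With a threshold of 0.95 and labels of about twenty characters, even a single edit falls below the cut-off under this normalisation, so in practice only near-identical labels are mapped.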



4 Integration Information from Ontologies 

The final step of the ontology creation is the integration of the modules into
one ontology. If a mapping exists between two classes $C_i$ and $C_j$ from the
modules $M_i$ and $M_j$, respectively, we add the equivalence relation
OWL:equivalentClass between these classes in the new ontology. Besides the
equivalence relationships, the new ontology contains the OWL:imports axioms
by which all the created modules are imported.
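
As an illustration of what such an integrated ontology contains, the following Python sketch renders the imports and equivalence axioms in OWL 2 functional-style syntax. It is not the tooling used in the case study; the function name and all IRIs under example.org are hypothetical placeholders.

```python
from typing import Iterable, Tuple


def integrated_ontology(ontology_iri: str,
                        module_iris: Iterable[str],
                        mappings: Iterable[Tuple[str, str]]) -> str:
    """Render imports and equivalence axioms in OWL 2 functional-style syntax."""
    lines = [f"Ontology(<{ontology_iri}>"]
    # One import per extracted module.
    lines += [f"  Import(<{iri}>)" for iri in module_iris]
    # One equivalence axiom per mapped pair of classes.
    lines += [f"  EquivalentClasses(<{a}> <{b}>)" for a, b in mappings]
    lines.append(")")
    return "\n".join(lines)


# Hypothetical IRIs, used only to show the shape of the output.
text = integrated_ontology(
    "http://example.org/toll-integrated",
    ["http://example.org/m1", "http://example.org/m2"],
    [("http://example.org/m1#TLR", "http://example.org/m2#TLR")],
)
print(text)
```

The integrated ontology itself stays small: it holds only the mappings, while the class definitions remain in the imported modules.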

So far, this all seems rather straightforward. However, the problem with
this integrated ontology $O_{1 \ldots n}$ is that it contains many
unsatisfiable classes. In order to understand the reason for this
unsatisfiability we have conducted several experiments. First, we have merged
all pairs of the modules, namely
$\forall i \neq j : O_{i,j} = M_i \cup M_j$.
For each merged ontology $O_{i,j}$ we have checked for unsatisfiable classes.
Already at this stage of integration several merged pairs contain unsatisfiable
classes. We have used the Pellet reasoner to obtain explanations of the
unsatisfiability. After we had repaired the unsatisfiable classes in the merged
pairs of ontologies $O_{i,j}$, we had to check the satisfiability of the
integrated ontology $O_{1 \ldots n}$. There were still 46 unsatisfiable
classes. The unsatisfiabilities in the integrated ontology have also been
resolved by means of Pellet reasoner explanations.
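
The pairwise experiment can be mimicked on toy data with the Python sketch below. It does not call Pellet or any other reasoner: the check is a deliberately simplified and incomplete stand-in that flags a class only when its superclasses (following subclass and equivalence axioms) include two classes declared disjoint. The axiom encoding, the function names, and the example axioms, including the disjointness axiom, are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Axiom = Tuple[str, str, str]  # ("sub" | "equiv" | "disjoint", class A, class B)


def superclasses(cls: str, axioms: Set[Axiom]) -> Set[str]:
    """Classes reachable from cls via subclass and equivalence axioms (incl. cls)."""
    up: Dict[str, Set[str]] = {}
    for kind, a, b in axioms:
        if kind in ("sub", "equiv"):
            up.setdefault(a, set()).add(b)
        if kind == "equiv":
            up.setdefault(b, set()).add(a)
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(up.get(current, ()))
    return seen


def unsatisfiable(axioms: Set[Axiom]) -> Set[str]:
    """Toy check: a class is flagged if it falls under two disjoint classes."""
    classes = {c for _, a, b in axioms for c in (a, b)}
    disjoint = [(a, b) for kind, a, b in axioms if kind == "disjoint"]
    flagged = set()
    for c in classes:
        sup = superclasses(c, axioms)
        if any(a in sup and b in sup for a, b in disjoint):
            flagged.add(c)
    return flagged


def check_pairs(modules: Dict[str, Set[Axiom]], mappings: Set[Axiom]) -> dict:
    """Merge every pair M_i, M_j with the mappings between them and report problems."""
    report = {}
    for (ni, mi), (nj, mj) in combinations(modules.items(), 2):
        merged = mi | mj
        known = {c for _, a, b in merged for c in (a, b)}
        merged |= {m for m in mappings if m[1] in known and m[2] in known}
        bad = unsatisfiable(merged)
        if bad:
            report[(ni, nj)] = bad
    return report


modules = {
    "M1": {("sub", "a:TLR", "a:Receptor"), ("disjoint", "a:Receptor", "a:Process")},
    "M2": {("sub", "b:TLR", "b:Signaling")},
    "M3": {("sub", "c:Signaling", "c:Thing")},
}
mappings = {("equiv", "a:TLR", "b:TLR"), ("equiv", "a:Process", "b:Signaling")}
report = check_pairs(modules, mappings)
```

In the toy data each module is satisfiable on its own; the conflict appears only after the equivalences place the same class under both a receptor and a process, which mirrors the situation where one ontology models Toll-like receptors as receptors and another in the context of signaling pathways.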

5 Conclusion

We have described a method to generate a new ontology on the basis of
bio-ontologies, most of which are available in the OBO Foundry. We have shown
how to create modules on the basis of the terms of interest. The signature for
the module extraction is enriched by the symbols from other modules, with the
fixpoint as a stop criterion. We have integrated the modules on the basis of
mappings created using Levenshtein distance similarity.

We have investigated how to resolve the unsatisfiable classes which appear
after the integration of the modules. Although the number of unsatisfiable
classes was high, it was possible to resolve the unsatisfiabilities with the
help of explanations provided by the Pellet reasoner.

In this study we have shown that modularity and simple mappings
provide a good foundation for the creation of a new ontology in a
pseudo-automated way. This method can be used when an ontology engineer does
not want to create a new ontology from scratch, but rather wants to reuse
knowledge already present in other ontologies. Moreover, this is the strategy
that should be preferred and applied more often as ontologies gain importance
in the life sciences.